Adaptive Load Balancing in MapReduce using Flubber

Authors

  • Rohit Paravastu
  • Rozemary Scarlat
  • Balakrishnan Chandrasekaran
Abstract

MapReduce has emerged as a successful framework for meeting the heavy demand for large-scale analytical data processing in this petabyte age. However, while the sheer size of the data makes problems more challenging, the flexibility offered by MapReduce frameworks makes the learning curve far steeper than expected. The general idea behind a MapReduce framework is to split a task into two components: a mapper and a reducer. The mapper executes a user-defined computation on chunks of data and generates intermediate results, while the reducer groups those results together based on a common attribute. Scalability, hence, appears as an inherent trait of the design. A critical parameter in this configuration is the number of reducers required for a given task, and frameworks like Hadoop expect the user to specify this parameter when submitting a job. In this report, we focus on Hadoop and argue that deciding the number of reducers is a non-trivial task, let alone deciding it before running the job. To address this issue, we present Flubber, a simple pre-job that can be sandwiched between the original job and Hadoop. With a couple of parameters from the user, it attempts to estimate the ideal number of reducers for the given job.
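
To make the mapper/reducer split and the role of the reducer count concrete, here is a minimal single-process sketch in Python. It is not Flubber's algorithm and not Hadoop code: the word-count job, the CRC32-based partitioner, and the names `map_words`, `partition`, and `run_job` are all assumptions chosen for illustration. It only shows that the number of reducers determines how intermediate pairs are spread across reducers.

```python
import zlib
from collections import defaultdict


def map_words(chunk):
    """Mapper (hypothetical): emit a (word, 1) pair for each word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def partition(key, num_reducers):
    """Assign a key to a reducer using a deterministic hash (CRC32, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(chunks, num_reducers):
    """Run map, shuffle, and reduce in one process; return results and reducer loads."""
    # Shuffle: route every intermediate pair to the bucket of its reducer.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for chunk in chunks:
        for key, value in map_words(chunk):
            buckets[partition(key, num_reducers)][key].append(value)
    # Reduce: each reducer sums the values grouped under each of its keys.
    results = [{k: sum(v) for k, v in b.items()} for b in buckets]
    # Load: number of intermediate pairs each reducer received.
    loads = [sum(len(v) for v in b.values()) for b in buckets]
    return results, loads


if __name__ == "__main__":
    chunks = ["the quick brown fox", "the lazy dog", "the end"]
    for r in (1, 2, 4):
        _, loads = run_job(chunks, r)
        print(f"reducers={r} loads={loads}")
```

Running the script prints the per-reducer load for several reducer counts. Because all occurrences of a frequent key such as "the" land on a single reducer, adding reducers does not necessarily even out the load, which is one reason the right count is hard to pick before the job runs.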

Similar resources

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. Current Hadoop, and the rack-aware data placement strategy of the Hadoop Distributed File System, assume a homogeneous cluster in which every node has the same computing capacity and is assigned the same workload. Default Hadoop d...

Comparative Study Load Balance Algorithms for Map Reduce environment

MapReduce is a well-known model for data-intensive parallel computing in shared-nothing clusters. One of its main issues is that its performance depends mainly on data distribution. MapReduce includes a simple load-balancing technique based on a FIFO job scheduler that serves jobs in their submission order, but unfortunately this is insufficient in real-world cases as it missed man...

An Improved Technique Of Extracting Frequent Itemsets From Massive Data Using MapReduce

The mining of frequent itemsets is a basic and essential task in many data mining applications. Extracting frequent itemsets, along with frequent patterns and rules, benefits applications such as association rule mining and correlation analysis, as well as product sales and marketing. A number of algorithms are used in the extraction process, such as FP-growth and Eclat. But unfortunately these algorit...

Jumbo: Beyond MapReduce for Workload Balancing

Over the past decade several frameworks such as Google MapReduce have been developed that allow data processing with unprecedented scale due to their high scalability and fault tolerance. However, these systems provide both new and existing challenges for workload balancing that have not yet been fully explored. The MapReduce model in particular has some inherent limitations when it comes to wo...

ROUTE: run-time robust reducer workload estimation for MapReduce

MapReduce has become a popular model for large-scale data processing in recent years. Many works on MapReduce scheduling (e.g., load balancing and deadline-aware scheduling) have emphasized the importance of predicting workload received by individual reducers. However, because the input characteristics and user-specified map function of a given job are unknown to the MapReduce framework before ...

Publication date: 2010